15 research outputs found

    Generalized topographic block model

    No full text
    Co-clustering leads to parsimony in data visualisation, with a number of parameters dramatically reduced in comparison to the dimensions of the data sample. Herein, we propose a new generalized approach for nonlinear mapping by a re-parameterization of the latent block mixture model. The densities modeling the blocks belong to an exponential family, of which the Gaussian, Bernoulli and Poisson laws are particular cases. The inference of the parameters is derived from the block expectation–maximization algorithm with a Newton–Raphson procedure at the maximization step. Empirical experiments with textual data support the usefulness of our generalized model.
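    As a rough illustration of what a Newton–Raphson procedure inside a maximization step can look like, the sketch below fits the log-rate of a single Poisson block from weighted counts. It is not the re-parameterized model of the abstract: the function name, the weights (standing in for posterior block memberships) and the single-parameter setting are assumptions made for illustration, and this toy case also has a closed-form solution, which is used only as a check.

```python
import math

def poisson_log_rate_newton(counts, weights, eta0=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for eta maximizing sum_i w_i * (x_i * eta - exp(eta)).

    Hypothetical helper: 'weights' play the role of posterior membership
    probabilities produced by an E-step.
    """
    sum_wx = sum(w * x for w, x in zip(weights, counts))
    sum_w = sum(weights)
    eta = eta0
    for _ in range(max_iter):
        rate = math.exp(eta)
        gradient = sum_wx - rate * sum_w   # first derivative in eta
        hessian = -rate * sum_w            # second derivative in eta (negative)
        step = -gradient / hessian         # Newton-Raphson update
        eta += step
        if abs(step) < tol:
            break
    return eta

# Closed-form check: the maximizer is the log of the weighted mean count.
eta_hat = poisson_log_rate_newton([1, 2, 3, 4], [1.0, 1.0, 1.0, 1.0])
```

    In models where the block parameters are tied together, no closed form is generally available, which is the usual reason for iterating at the maximization step.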

    Visualization of generalized mean estimators using auxiliary information in survey sampling

    No full text
    This communication proposes a generalized method for modeling mean estimators. The mean estimators depend on multiple auxiliary variates and unknown parameters in a finite population setting. Our approach naturally yields a graphical analysis for comparing and improving mean estimators.

    Probabilistic Elastic Embedding Model: Comparison of Alternative Models

    No full text
    In data visualization, Elastic Embedding adds an exponential penalty to a Euclidean criterion. It is able to separate the natural classes, but it lacks a probabilistic generative setting, which would bring more flexibility to the modeling and the inference. Hence, a new generative interpretation of Elastic Embedding is proposed, which is closely related to LargeVis. Numerical experiments compare the proposed model with several alternatives via two new visual indicators.
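    The "exponential penalty added to a Euclidean criterion" can be made concrete with the commonly cited form of the (non-probabilistic) Elastic Embedding objective: an attractive term weighting squared distances plus a repulsive term weighting exp(-squared distance). The sketch below only evaluates that objective; the function and argument names are assumptions, and it does not implement the generative model proposed in the abstract.

```python
import numpy as np

def elastic_embedding_objective(X, W_attract, W_repel, lam=1.0):
    """Evaluate sum_nm W+_nm * d_nm^2 + lam * sum_nm W-_nm * exp(-d_nm^2).

    X         : (n, d) low-dimensional positions
    W_attract : (n, n) non-negative attractive weights (assumed given)
    W_repel   : (n, n) non-negative repulsive weights (assumed given)
    lam       : trade-off between attraction and repulsion
    """
    diff = X[:, None, :] - X[None, :, :]      # pairwise differences
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances
    attract = np.sum(W_attract * sq_dist)     # Euclidean criterion
    repel = lam * np.sum(W_repel * np.exp(-sq_dist))  # exponential penalty
    return attract + repel

# Two points at distance 1, symmetric unit weights off the diagonal.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
W = np.array([[0.0, 1.0], [1.0, 0.0]])
value = elastic_embedding_objective(X, W, W, lam=2.0)
```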

    Sparse and reduced-rank family of generalized regressions with transformation from PCA or autoencoder

    No full text
    Linear regression is one of the most studied methods after descriptive statistics and univariate tests, because it aims at understanding a target variable as a function of explanatory or predictive variables. The interest in PCA regression lies in improving the estimation of the output through new algorithms that reduce the design matrix, which is also relevant for generalized linear models. By combining several criteria, a family of new objective functions is proposed, and the results are compared with PCA regression and regression.
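    For readers unfamiliar with the baseline mentioned here, a minimal sketch of classical PCA regression follows: the centered design matrix is projected onto its leading principal directions, and least squares is run on the reduced scores. This is the standard textbook baseline, not the new family of objective functions proposed in the abstract; the function name and the synthetic data are assumptions.

```python
import numpy as np

def pca_regression(X, y, n_components):
    """Least squares on the first n_components principal component scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                                  # center the design matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                          # (p, k) principal directions
    Z = Xc @ V                                       # reduced design matrix
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V @ gamma                                 # back to original variables
    intercept = y_mean - x_mean @ beta
    return beta, intercept

# Synthetic, noise-free data: keeping all components recovers ordinary least squares.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 3.0
beta, intercept = pca_regression(X, y, n_components=3)
```

    Choosing fewer components than columns trades some bias for a smaller, better-conditioned design matrix.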

    Visualization of generalized mean estimators using auxiliary information in survey sampling: additive case

    No full text
    Mean estimators with ratios depend on multiple auxiliary variables and unknown parameters in a finite population setting. Recently, a new generic approach for modeling multivariate mean estimators with matrices has been proposed in order to compute their minimum mean squared error automatically. This naturally yields a graphical analysis for comparing mean estimators via nonlinear curves of their approximated mean squared error or bias. Herein, generalized additive ratio estimators with two auxiliary variables and higher-order expansions in the approximations are proposed, following a brief review of the new generic method and an extension to constrained parameters. This requires completing the main matrix at stake with higher-order moments of the auxiliary and target variables, while keeping an underlying regression model for the optimization. A perspective is the visualization of alternative models under this framework when empirical means are associated with ratio functions of auxiliary variables.

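    Méthodes de carte auto-organisatrice par mélange de lois contraintes. Application à l'exploration dans les tableaux de contingence textuels [Self-organizing map methods using mixtures of constrained distributions. Application to the exploration of textual contingency tables]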

    No full text
    This thesis is concerned with the exploratory analysis of multidimensional data, which are often qualitative or textual, using particular models of Kohonen's self-organizing map. The goal is to cluster and project simultaneously the rows or columns of a data matrix. The result of these methods is a reduction in the form of a discrete regression surface. We study more precisely mixture models of probability distributions: the parameters corresponding to the means of the clustered vectors are constrained by placing them at the nodes of a rectangular mesh. After an overview of these methods, and of the learning algorithms based on EM (Expectation-Maximization), we introduce two new approaches. The first one aims at generalizing the Correspondence Analysis method to large matrices: the CASOM algorithm is a naive Bayes classifier, which is constrained as a TPEM (Topology Preserving EM) for a contingency table. The second one consists of a general scheme for adapting image segmentation methods into self-organizing map algorithms. As an illustration, we modify a mean-field segmentation algorithm and obtain an algorithm named TNEM. We use these methods to ease navigation in a textual corpus, obtaining objective criteria and means of representation.

    Symmetric generative methods and tSNE: a short survey

    No full text
    In data visualization, a family of methods is dedicated to symmetric numerical matrices which contain the distances or similarities between high-dimensional data vectors. The method t-Distributed Stochastic Neighbor Embedding and its variants lead to competitive nonlinear embeddings which are able to reveal the natural classes. For comparison, we survey recent probabilistic and model-based alternative methods from the literature (LargeVis, GloVe, Latent Space Position Model, probabilistic Correspondence Analysis, Stochastic Block Model) for nonlinear embedding via low-dimensional positions.

    Estimation of correlations between cross-sectional estimates from repeated surveys: an application to the variance of change

    No full text
    Measuring change over time is a central problem for many users of social, economic and demographic data and is of interest in many areas of economics and social sciences. Smith et al. (2003) recognised that assessing change is one of the most important challenges in survey statistics. The primary interest of many users is often in changes or trends from one time period to another. A common problem is to compare two cross-sectional estimates for the same study variable taken on two different waves or occasions, and to judge whether the observed change is statistically significant. This involves the estimation of the sampling variance of the estimator of change. Estimation of the variance of change would be relatively straightforward if cross-sectional estimates were based upon the same sample. Unfortunately, samples from different waves are usually not completely overlapping sets of units, because of rotations used in repeated surveys. This implies that cross-sectional estimates are not independent. Correlation plays an important role in estimating the variance of a change between the cross-sectional estimates. The unbiasedness of an estimator of a correlation is crucial, because a small bias can lead to significant over-estimation or under-estimation of the variance of change (Berger, 2004). Several methods can be used to estimate correlations, some of which use re-sampling and/or Taylor linearization. We propose to use a multivariate linear regression approach to estimate the correlation. The proposed estimator is not a model-based estimator, as it remains valid even if the model does not fit the data. We show that the regression approach gives a design-consistent estimator of the correlation when the finite population corrections are negligible. We show how the proposed estimator can accommodate stratified and two-stage sampling designs. We also show how the proposed estimator can be used to estimate the correlation between complex estimators of change.
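    The role of the correlation can be seen from the standard identity for the variance of a difference: var(t2 - t1) = var(t1) + var(t2) - 2 * corr * sqrt(var(t1) * var(t2)). The sketch below simply evaluates this identity; it does not implement the regression-based correlation estimator of the abstract, and the function name and the numbers are illustrative assumptions.

```python
import math

def variance_of_change(var1, var2, corr):
    """Variance of the difference between two correlated estimates."""
    if var1 < 0 or var2 < 0:
        raise ValueError("variances must be non-negative")
    if not -1.0 <= corr <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return var1 + var2 - 2.0 * corr * math.sqrt(var1 * var2)

# Illustrative numbers only: wave variances of 4 and 9.
independent = variance_of_change(4.0, 9.0, 0.0)   # correlation ignored
overlapping = variance_of_change(4.0, 9.0, 0.5)   # positive correlation from sample overlap
```

    With a positive correlation, treating the two waves as independent overstates the variance of change, which is why a biased correlation estimate feeds directly into a biased variance.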

    Probabilistic Enhanced Mapping with the Generative Tabular Model

    No full text
    International audience